Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR
Abstract
In this paper we investigate the use of voice activity detection (VAD) for improving the noise models used for cepstral domain minimum mean squared error (MMSE) filtering of noisy speech. Because MFCC features are so widely used for speech recognition, it is useful to have VAD methods and MMSE filtering algorithms that both work in the MFCC domain. We propose a VAD method based on the likelihood ratio test (LRT) that works directly on MFCC feature vectors. Detected noise-only frames are collected and used to build a noise model, which is then used for MMSE filtering. Finally, speech recognition is run using models trained in clean conditions. Experiments on AURORA2 show that our approach improves the noise model compared to the common approach of simply using the first few frames of each file for noise modeling, and that the proposed VAD method performs comparably to a well-known LRT-based VAD algorithm that works in the DFT domain.
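The pipeline described in the abstract (initial noise estimate, LRT decision per frame, noise model rebuilt from the detected noise-only frames) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes single diagonal-covariance Gaussians for both speech and noise, a fixed speech model supplied by the caller, and a zero log-likelihood-ratio threshold. The function names, the `n_init` and `var_floor` parameters, and the toy two-dimensional "features" are all hypothetical, and the MMSE filtering step is not shown.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log-density of vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_diag_gauss(frames, var_floor=1e-3):
    """Per-dimension mean and floored variance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def lrt_vad(frames, speech_model, noise_model, threshold=0.0):
    """Mark a frame as speech when log p(x|speech) - log p(x|noise) > threshold."""
    s_mean, s_var = speech_model
    n_mean, n_var = noise_model
    return [
        diag_gauss_loglik(x, s_mean, s_var)
        - diag_gauss_loglik(x, n_mean, n_var) > threshold
        for x in frames
    ]


def refine_noise_model(frames, speech_model, n_init=3, threshold=0.0):
    """Start from the first n_init frames, then refit on all detected noise frames."""
    initial = fit_diag_gauss(frames[:n_init])  # the "first few frames" baseline
    is_speech = lrt_vad(frames, speech_model, initial, threshold)
    noise_frames = [x for x, s in zip(frames, is_speech) if not s]
    if not noise_frames:  # assumption: fall back to the baseline if none found
        return initial, is_speech
    return fit_diag_gauss(noise_frames), is_speech


# Toy data standing in for MFCC vectors: noise near 0, speech near 5.
frames = [[0.1, -0.1], [0.0, 0.2], [-0.1, 0.0],
          [5.0, 5.1], [4.9, 5.0], [0.05, 0.1]]
speech_model = ([5.0, 5.0], [1.0, 1.0])  # assumed, not trained here
noise_model, is_speech = refine_noise_model(frames, speech_model)
print(is_speech)
print(noise_model)
```

In this toy run the last frame is also picked up as noise, so the refined model is estimated from four frames rather than only the first three, which is the kind of gain over the first-few-frames baseline that the abstract reports.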
Similar resources
Kalman and unscented kalman filter feature enhancement for noise robust ASR
Model-based feature enhancement is an ASR front-end technique to increase the robustness of the recogniser in noisy environments. However, its MMSE-estimates of the clean speech feature vectors are based only on the static components at the current frame. In this paper, we show how the Kalman filter framework can be seen as a natural extension that incorporates both the current and the previous...
Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...
Noise Robustness of Traditional Features for Macedonian Voice Dialing ASR
Automatic Speech Recognition Systems of today are intensely deployed in real world application scenarios which are often characterized by suboptimal operating conditions. Thus their noise robustness has become a crucial parameter when assessing ASR in-field performance. The paper examines the noise robustness of traditional ASR feature sets as applied to a Voice Dialing Application built for Ma...
Robust Speech Recognition using Model
Maintaining a high level of robustness for Automatic Speech Recognition (ASR) systems is especially challenging when the background noise has a time-varying nature. We have implemented a Model-Based Feature Enhancement (MBFE) technique that not only can easily be embedded in the feature extraction module of a recogniser, but also is intrinsically suited for the removal of non-stationary additiv...
MFCC and CMN Based Speaker Recognition in Noisy Environment
The performance of an automatic speaker recognition (ASR) system degrades drastically in the presence of noise and other distortions, especially when there is a noise level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that speech signals are corrupted by noise. A major problem of most speaker recognitio...